Multivariate and Multi-stage Mixed Models

Author

Marnin Wolfe

Previously: More Complex Mixed Models

Lesson plan

Our goal today is to fill out our capabilities with asreml / mixed-models.

Several additional layers of modelling options in asreml:

  1. Spatial autoregressive models or Residuals
  2. Estimating genetic correlations with Multi-trait models
  3. Two-stage analysis (discussion only, hands-on later) for downstream GWAS/GP

Part I: Spatial and Multi-trait

Spatial analysis

Often, the exact row-column design (spatial layout) of field trials varies from year-to-year.

We already talked about some capacity of mixed-models to account for heterogeneous error variance among trials.

However, real fields can present gradients or other patterns that are not foreseeable when designing experiments. For example, variation in soil water holding capacity or pH, nutrients, etc. Those factors cause spatial correlation patterns between the errors that may not be accounted for by the experimental blocking effects.

If unaccounted for, these kinds of phenomena will bias our conclusions about the genetic units in the experiment.

An example of spatial variation within a field. Likely an error during pre-emergence herbicide resulted in the visible pattern. Unfortunately, the complete blocks in this experimental design run perpendicular (left-right) to the weed pressure. Correcting the residuals will be crucial to a fair evaluation of the clovers in this nursery.

So, what are our options?

IID Residuals

Option 1: By default we model errors as independent and identically distributed: ϵN(0,𝐈σe2)\epsilon \sim N(0,\textbf{I}\sigma^2_e). Basically, ignore variation within field.

For this model, the residual structure, with regards to the field dimensions looks like this:

Option 2: We can include main-effects in the model for range and row (field dimensions), as random post-hoc blocking factors.

Spatial Autoregressive Models

Option 3 and our focus for today: In this approach, a correlation matrix is applied to the residuals such that errors are correlated with the degree of correlation proportional to spatial proximity. Autocorrelation can be applied to one or both spatial dimensions. The R matrix should look something like this:

R=σe2Σc(ρc)Σr(ρr) R = \sigma^2_e\Sigma_c(\rho_c) \otimes \Sigma_r(\rho_r)

Σc(ρc)\Sigma_c(\rho_c) is a correlation matrix for the columns

Σr(ρr)\Sigma_r(\rho_r) is a correlation matrix for the rows

They are square and symmetric matrices (i.e. row-by-row and col-by-col).

Row matrix has dimension r×rr \times r. Column matrix has c×cc \times c.

ρc\rho_c and ρr\rho_r are auto-correlation parameters that get estimated when we fit the model with REML. The exponents on ρ\rho cause correlation to drop-off to zero with distance between plots.

In Isik, Holland, and Maltecca (2017) (Chapter 7), the example of a 3 row by 4 column design is given:

An “AR1 x AR1” model means a model with autoregressive correlation structure for both the ROW and COLUMN effects.

Model selection procedures include LRTs and AIC/BIC should be used to decide whether and which spatial approach to use.

Example

library(tidyverse)
library(asreml)
Online License checked out Mon Feb  2 11:50:40 2026
alt<-read.csv(here::here("data","19-25CC_ALT_DATA_ALL_2025-08-23.csv"))

# My example data chunk
# This time expanding to include 2 sites, 2 years
# Still small, but suits my examples
# Keep using your own!
alny_alt<-alt %>% 
  filter(site %in% c("AL","NY"),
         year %in% c(24,25))

Plot spatial patterns

Let’s look for some spatial patterns to model

alny_alt %>% 
  group_by(site,site.year) %>% 
  summarize(range=max(range),
            row=max(row))
`summarise()` has grouped output by 'site'. You can override using the
`.groups` argument.
# A tibble: 4 × 4
# Groups:   site [2]
  site  site.year range   row
  <chr> <chr>     <int> <int>
1 AL    24AL          8     7
2 AL    25AL          5    12
3 NY    24NY          6    10
4 NY    25NY          6    12

4 site.years

4 different layouts

alny_alt %>% 
  ggplot(.,aes(x=site.year,fill=rep,y=biomass.1)) + geom_boxplot()
Warning: The following aesthetics were dropped during statistical transformation: fill.
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?

Let’s make some plots. Do you see any trends or patterns across the spatial dimensions?

alny_alt %>% mutate(range=as.character(range)) %>% 
  ggplot(.,aes(x=range,fill=range,y=biomass.1)) + geom_boxplot() + facet_grid(site+year~.,scales='free')

alny_alt %>% mutate(row=factor(row,levels=1:13)) %>% 
  ggplot(.,aes(x=row,fill=row,y=biomass.1)) + geom_boxplot() + facet_grid(site+year~.,scales='free')

My attempt to make a heatmap of the spatial variation in biomass from one site.year:

alny_alt %>% 
  filter(site.year=="24NY") %>% 
  select(row,range,biomass.1) %>% 
  spread(range,biomass.1) %>% as.matrix %>% 
  image(.)

Pick an single trial

ny24_alt<-alny_alt %>% 
  filter(site.year=="24NY")
ny24_alt<-ny24_alt %>% 
# Make factors before modeling
  mutate(entry=as.factor(entry),
         rep=as.factor(paste0(site.year,"-",rep)),
         row=as.factor(row),
         range=as.factor(range)) %>% 
  arrange(range,row)

For our current purpose, let’s analyze just one site-year.

iid_error <- asreml(biomass.1~rep, 
                    random=~entry,
                    data=ny24_alt)
ASReml Version 4.2 02/02/2026 11:50:41
          LogLik        Sigma2     DF     wall
 1     -115.3087      15.93264     58   11:50:41
 2     -110.9788      10.87602     58   11:50:41
 3     -107.8596      7.527075     58   11:50:41
 4     -106.9027      6.132323     58   11:50:41
 5     -106.7701      5.607674     58   11:50:41
 6     -106.7693      5.566663     58   11:50:41
summary(iid_error)$varcomp
        component std.error  z.ratio bound %ch
entry   13.002783  4.257152 3.054339     P   0
units!R  5.566663  1.435833 3.876958     P   0

Row/Range as random

row_random <- asreml(biomass.1~rep, 
                     random=~entry+row,
                     data=ny24_alt)
ASReml Version 4.2 02/02/2026 11:50:41
          LogLik        Sigma2     DF     wall
 1     -116.2221      15.36010     58   11:50:41
 2     -111.1389      10.21223     58   11:50:41
 3     -107.5665      6.758174     58   11:50:41
 4     -106.4198      5.267610     58   11:50:41
 5     -106.2339      4.669468     58   11:50:41
 6     -106.2314      4.595406     58   11:50:41
# summary(row_random)$varcomp
lucid::vc(row_random)
  effect component std.error z.ratio bound %ch
     row    0.8572     1.047    0.82     P 0.5
   entry   13.63       4.34     3.1      P 0.1
 units!R    4.595      1.376    3.3      P 0  
lrt(iid_error, row_random)
Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)

                     df LR-statistic Pr(Chisq)
row_random/iid_error  1       1.0759    0.1498
rng_random <- asreml(biomass.1~rep, 
                     random=~entry+range,
                     data=ny24_alt)
ASReml Version 4.2 02/02/2026 11:50:41
          LogLik        Sigma2     DF     wall
 1     -115.1688      15.15870     58   11:50:41
 2     -110.6454      10.07005     58   11:50:41
 3     -107.4475      6.824721     58   11:50:41
 4     -106.4917      5.568092     58   11:50:41
 5     -106.3596      5.138133     58   11:50:41
 6     -106.3581      5.121793     58   11:50:41
Warning in asreml(biomass.1 ~ rep, random = ~entry + range, data = ny24_alt):
Some components changed by more than 1% on the last iteration
# summary(rng_random)$varcomp
lucid::vc(rng_random)
  effect component std.error z.ratio bound %ch
   range     1.024     1.485    0.69     P 1.2
   entry    12.42      4.071    3.1      P 0  
 units!R     5.122     1.391    3.7      P 0  
lrt(iid_error, rng_random)
Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)

                     df LR-statistic Pr(Chisq)
rng_random/iid_error  1      0.82235    0.1822
# this should be identical to the "iid" residuals model, just allow me to plot residual variogram
iid_error1<-asreml(biomass.1~rep, 
                   random=~entry,
                   residual=~id(range):id(row),
                   data=ny24_alt)
ASReml Version 4.2 02/02/2026 11:50:41
          LogLik        Sigma2     DF     wall
 1     -115.3087      15.93264     58   11:50:41
 2     -110.9788      10.87602     58   11:50:41
 3     -107.8596      7.527075     58   11:50:41
 4     -106.9027      6.132323     58   11:50:41
 5     -106.7701      5.607674     58   11:50:41
 6     -106.7693      5.566663     58   11:50:41
plot(varioGram(iid_error1))

AR1

ar1_rng <- asreml(biomass.1~rep, 
                  random=~entry,
                  residual=~ar1(range):id(row),
                  data=ny24_alt)
ASReml Version 4.2 02/02/2026 11:50:41
          LogLik        Sigma2     DF     wall
 1     -116.0934      16.55381     58   11:50:41
 2     -110.8165      10.75185     58   11:50:41
 3     -106.7631      7.113229     58   11:50:41
 4     -105.5400      5.790797     58   11:50:41
 5     -105.4022      5.333902     58   11:50:41
 6     -105.4018      5.316984     58   11:50:41
plot(varioGram(ar1_rng))

lrt(iid_error1, ar1_rng)
Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)

                   df LR-statistic Pr(Chisq)  
ar1_rng/iid_error1  1       2.7351   0.04908 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(iid_error1)$aic
[1] 217.5386
attr(,"parameters")
[1] 2
summary(ar1_rng)$aic
[1] 216.8035
attr(,"parameters")
[1] 3
lucid::vc(ar1_rng)
              effect component std.error z.ratio bound %ch
               entry   14.08      4.343      3.2     P   0
         range:row!R    5.317     1.414      3.8     P   0
 range:row!range!cor    0.3668    0.1953     1.9     U   0

Genetic variance went up!

ar1_row <- asreml(biomass.1~rep, 
                  random=~entry,
                  residual=~id(range):ar1(row),
                  data=ny24_alt)
ASReml Version 4.2 02/02/2026 11:50:41
          LogLik        Sigma2     DF     wall
 1     -115.1729      16.07438     58   11:50:41
 2     -110.1702      10.71814     58   11:50:41
 3     -106.7327      7.515932     58   11:50:41
 4     -105.8954      6.338057     58   11:50:41
 5     -105.8080      5.905261     58   11:50:41
 6     -105.8078      5.876783     58   11:50:41
plot(varioGram(ar1_row))

lrt(iid_error1, ar1_row)
Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)

                   df LR-statistic Pr(Chisq)  
ar1_row/iid_error1  1       1.9231   0.08276 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(iid_error1)$aic
[1] 217.5386
attr(,"parameters")
[1] 2
summary(ar1_row)$aic
[1] 217.6155
attr(,"parameters")
[1] 3
lucid::vc(ar1_row)
            effect component std.error z.ratio bound %ch
             entry   12.64      4.049      3.1     P 0  
       range:row!R    5.877     1.629      3.6     P 0  
 range:row!row!cor    0.3326    0.2124     1.6     U 0.1
ar1_rngrow <- asreml(biomass.1~rep, 
                     random=~entry,
                     residual=~ar1(range):ar1(row),
                     data=ny24_alt)
ASReml Version 4.2 02/02/2026 11:50:41
          LogLik        Sigma2     DF     wall
 1     -115.9548      16.69992     58   11:50:41
 2     -109.7624      10.53414     58   11:50:41
 3     -105.2070      7.207152     58   11:50:41
 4     -104.3140      6.167376     58   11:50:41
 5     -104.2257      5.754424     58   11:50:41
 6     -104.2252      5.727410     58   11:50:41
plot(varioGram(ar1_rngrow))

lrt(iid_error1, ar1_rngrow)
Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)

                      df LR-statistic Pr(Chisq)  
ar1_rngrow/iid_error1  2       5.0881   0.03168 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(iid_error1)$aic
[1] 217.5386
attr(,"parameters")
[1] 2
summary(ar1_rngrow)$aic
[1] 216.4504
attr(,"parameters")
[1] 4
lucid::vc(ar1_rngrow)
              effect component std.error z.ratio bound %ch
               entry   13.67      4.159      3.3     P 0.1
         range:row!R    5.727     1.733      3.3     P 0  
 range:row!range!cor    0.3867    0.1894     2       U 0  
   range:row!row!cor    0.3624    0.2175     1.7     U 0.1
summary(iid_error1)$aic
[1] 217.5386
attr(,"parameters")
[1] 2
summary(ar1_rngrow)$aic
[1] 216.4504
attr(,"parameters")
[1] 4
summary(ar1_rng)$aic
[1] 216.8035
attr(,"parameters")
[1] 3

Add a “nugget”

The units term is sometimes called the “nugget effect” or the “measurement error”.

Add a plot-level (one level for each experimental unit) random-effect to the model in addition to the spatial residual term.

Separates residual effects into two parts. portion independent among plots and another part for correlated errors associated with spatial distance.

The correlated-effects might be b/c soil fertility, water holding capacity, etc.

The measurement error (nugget) due might be due to ‘permanent environment’ or field position effects; something constant, like having a cow sit on a particular plot.

ar1_rngrow_nugget <- asreml(biomass.1~rep, 
                            random=~entry+idv(units),
                            residual=~ar1(range):ar1(row),
                            data=ny24_alt, maxit=90)
ASReml Version 4.2 02/02/2026 11:50:41
          LogLik        Sigma2     DF     wall
 1     -116.0673      15.30780     58   11:50:41  (  1 restrained)
 2     -111.4385      11.90274     58   11:50:41  (  1 restrained)
 3     -106.3191      7.968799     58   11:50:41  (  1 restrained)
 4     -104.4891      6.477581     58   11:50:41  (  1 restrained)
 5     -104.2402      5.901006     58   11:50:41  (  1 restrained)
 6     -104.2244      5.699143     58   11:50:41  (  1 restrained)
 7     -104.2171      5.208599     58   11:50:41
 8     -104.2012      4.864135     58   11:50:41
 9     -104.1918      4.682068     58   11:50:41
10     -104.1867      4.608293     58   11:50:41
11     -104.1842      4.592215     58   11:50:41
12     -104.1829      4.596334     58   11:50:41
13     -104.1822      4.606527     58   11:50:41
14     -104.1817      4.617851     58   11:50:41
15     -104.1815      4.628659     58   11:50:42
lrt(iid_error1, ar1_rngrow_nugget)
Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)

                             df LR-statistic Pr(Chisq)  
ar1_rngrow_nugget/iid_error1  3       5.1756   0.05671 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lrt(ar1_rngrow, ar1_rngrow_nugget)
Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)

                             df LR-statistic Pr(Chisq)
ar1_rngrow_nugget/ar1_rngrow  1     0.087489    0.3837
summary(ar1_rngrow)$aic
[1] 216.4504
attr(,"parameters")
[1] 4
summary(ar1_rngrow_nugget)$aic
[1] 218.363
attr(,"parameters")
[1] 5
summary(ar1_rngrow_nugget)$varcomp
                     component std.error   z.ratio bound %ch
entry               13.5962748 4.1745145 3.2569716     P 0.2
units!units          1.6728314 2.2348234 0.7485296     P 0.8
range:row!R          4.6286587 2.7798010 1.6651043     P 0.0
range:row!range!cor  0.5803943 0.3654835 1.5880180     U 0.5
range:row!row!cor    0.5908623 0.4029625 1.4662961     U 0.9

Adding the “nugget” made a poorer fit model.

Has better application with long-term perennial plots, for example.

Heritability and spatial models

Spatially correlated residual model introduces some complications for estimating heritability.

AR models may provide a better fit to data but can inflate the residual variance.

For “traditional” variance component-based heritability estimates, asreml provides a handy function called vpredict().

# Handy asreml function: vpredict 
## Calculates functions of the variance components and their std. errors
#iid_error$vparameters
vpredict(iid_error,H2~V1/(V1+V2))
    Estimate         SE
H2 0.7002246 0.09417944
# Handy asreml function: vpredict 
## Calculates functions of the variance components and their std. errors
vpredict(ar1_rngrow,H2~V1/(V1+V2+V3))
    Estimate         SE
H2 0.6909303 0.09436592

The alternative is 1PEVσg21-\frac{\bar{PEV}}{\sigma^2_g}, the mean reliability (see previous lesson).

# extract BLUPs and make them into a data.frame
blups<-summary(ar1_rngrow,coef=T)$coef.random %>% 
  as.data.frame %>% 
  rownames_to_column(var = "entry") %>% 
  filter(grepl("^entry_",entry)) %>% 
  mutate(entry=gsub("entry_","",entry,fixed = T))
# extract Vg (variance component associated with "entry" effect)
sigma2g<-summary(ar1_rngrow)$varcomp["entry","component"]
# Compute PEV and REL for each BLUP
blups<-blups %>% 
  mutate(PEV=std.error^2,
         REL=1-(PEV/sigma2g))
# Mean REL is our estimate of heritability:
mean(blups$REL)
[1] 0.8437074

Multi-trial

You can use the dsum() function like we did previously, along with auto-regressive structures. This allows to model each trial’s spatial error separately.

# Make factors before modeling
alny_alt<-alny_alt %>% 
  mutate(entry=as.factor(entry),
         site.year=as.factor(site.year),
         site=as.factor(site),
         # unless you are expecting / modeling a linear trend in year
         # convert it to factor
         year=as.factor(year),
         # explicitly nest values of rep in site.year
         # review explicit vs. implicit nesting
         rep=as.factor(paste0(site.year,"-",rep)),
         row=as.factor(row),
         range=as.factor(range)) %>% 
  # ORDER IS IMPORTANT HERE
  arrange(site.year,range,row)

ar1_alny <- asreml(biomass.1~1+site.year, 
                     random=~entry+rep,
                     residual=~dsum(~ar1(range):ar1(row)|site.year),
                     data=alny_alt, maxiter=90)
ASReml Version 4.2 02/02/2026 11:50:42
          LogLik        Sigma2     DF     wall
 1     -635.6699           1.0    244   11:50:42  (  2 restrained)
 2     -613.1544           1.0    244   11:50:42
 3     -600.7582           1.0    244   11:50:42
 4     -594.2571           1.0    244   11:50:42
 5     -591.6365           1.0    244   11:50:42  (  1 restrained)
 6     -590.8221           1.0    244   11:50:42  (  1 restrained)
 7     -590.6447           1.0    244   11:50:42  (  1 restrained)
 8     -590.6117           1.0    244   11:50:42  (  1 restrained)
 9     -590.6052           1.0    244   11:50:42
10     -590.6039           1.0    244   11:50:42
11     -590.6036           1.0    244   11:50:42
Warning in asreml(biomass.1 ~ 1 + site.year, random = ~entry + rep, residual =
~dsum(~ar1(range):ar1(row) | : Some components changed by more than 1% on the
last iteration
plot(varioGram(ar1_alny))

summary(ar1_alny)$varcomp
                             component   std.error       z.ratio bound   %ch
rep                       5.530377e-06          NA            NA     B    NA
entry                     2.279569e+00  1.06556154  2.1393123060     P   0.2
site.year_24AL!R          2.424695e+02 57.11406757  4.2453547994     P   0.0
site.year_24AL!range!cor  4.045302e-01  0.13000414  3.1116719832     U   0.0
site.year_24AL!row!cor    3.286445e-01  0.13785021  2.3840692566     U   0.0
site.year_24NY!R          1.418477e+01  3.02084593  4.6956268241     P   0.0
site.year_24NY!range!cor -1.492073e-01  0.15674867 -0.9518887772     U   0.1
site.year_24NY!row!cor    5.768768e-05  0.15138292  0.0003810712     U 366.8
site.year_25AL!R          1.967632e+02 37.65342648  5.2256384660     P   0.0
site.year_25AL!range!cor  1.447034e-03  0.15382791  0.0094068388     U   4.6
site.year_25AL!row!cor    1.661224e-01  0.13217975  1.2567918090     U   0.0
site.year_25NY!R          1.966660e+01  6.66496577  2.9507428747     P   0.2
site.year_25NY!range!cor  6.560056e-01  0.11568750  5.6704967625     U   0.1
site.year_25NY!row!cor    6.715315e-01  0.09137516  7.3491692384     U   0.1

Alternatives to AR models

If your design isn’t rectangular or there is significant missingness in the data, the autoregressive model can pose problems.

An alternative is to use splines as described by Rodríguez-Álvarez et al. (2018). This can be implemented using the sommer mixed-model R package or the SpATs packages. Splines are less susceptible to missing data and also don’t mind non-rectangular designs.

Multi-trait models

Background

Recommend reading Isik, Holland, and Maltecca (2017) - Chapter 6.

Multiple responses variables (traits) can be jointly analyzed with a mixed-model. Such models enable the estimation of genetic correlations (genetic variance-covariance) between traits.

What is genetic correlation?

The tendency of different phenotypes to be co-inherited.

Trait variances can be decomposed into genetic and environmental components. So can covariances.

Phenotypic correlation due to genetic + environmental sources

Genetic variance components for two traits x and y. A = Additive; D = Dominance; AA, AD and DD = Epistasic components

Additive genetic correlation.

Where do genetic correlations come from?

  • Genetically speaking: plieotropy (one allele influences multiple traits) and linkage (alleles at different loci are co-inherited)
  • Population processes:
    • Selection on a suite of traits (correlational selection)
    • Assortative mating (multi-trait mate preferences)
    • Gene flow / migration (mixing of different populations)
    • Maternal / Paternal effects (parental genotype influences offspring phenotype)
    • Genetic drift

Why do we care? What are genetic correlations “good for”?

Genetic correlations influence the direction of selection response and can either facilitate or else constrain evolutionary change in a population.

R=Gb R = Gb

This is one version of what we call the “Breeder’s Equation” (Hill and Mackay 2004; Lande and Arnold 1983; Animal Breeding Plans.Jay L. Lush” 1946).

  • R is a column vector, responses in each character (across generations) to selection
  • b is a column vector of selection gradients (depicting the strength and direction of selection on one or more traits.
  • G is a matrix of (additive) genetic variance-covariance between traits.

In short, select response tends in the direction of genetic correlations, even if the direction of selection is opposed to the correlation. Example: if seed size and seed number or negatively genetically correlated, selection for larger seed size and number will proceed very slowly.

Practically, genetic correlations can be used to predict the effectiveness of indirect selections on one trait.

Statistical / Analytic Benefits:

Multivariate mixed-models for low heritability traits (like grain yield or biomass) typically have low selection accuracy. When there are correlated traits, potentially that are more precisely measurable and/or have higher heritability, joint multivariate modeling can increase our accuracy. By exploiting their covariances with higher heritability traits (such as size and morphology), yield or lower heritability traits can be better evaluated.

Multivariate models can accommodate the implicit structure of the selection process.

For example, in open-pollinated clover breeding, culling selection is used to remove poor performing plants before they flower. Early on, we score traits on ALL plants. Later, after we apply selection, when we record trait data we are missing data on the culled plants. In other words, the data are not missing at random but are missing more frequently for later-recorded traits.

We’ll circle back to benefits and use cases as we talk more about plant breeding and quantitative genetics.

For now, let’s look at two ways to estimate genetic variance-covariance between traits in asreml.


Consider the following equation for a mixed linear model:

y=Xb+Zu+ey = Xb + Zu + e

var(y)=V=ZGZ+Rvar(y)=V = ZGZ'+ R

The default variance of random effects is 𝐺=𝐼σu2𝐺=𝐼\sigma^2_u and of residuals is 𝑅=Iσe2𝑅=I\sigma^2_e.

However, there are other variance-covariance functions available in asreml that can be tested. Each of these breaks the assumption of independence between components. For example:

Model Number Parameter Description asreml-R
Identity 1 Identical variation id
Diagonal M Heterogeneous variations diag
CS 2 Compound symmetry with homogeneous variance interation model
CS Het M+1 Compound symmetry with heterogeneous variance corh
AR1 M+1 First order autoregressive model ar1
Unstructured M(M+1)/2 Unstructured model us

Graphically:

In a graphical representation, we will assume 4 genotypes that were evaluated in four different locations. Credit: Ferrao

The multivatiate mixed-model can be written by stacking univariate mixed-models on top of each other and adding covariance parameters to-be-estimated to the matrices G and R.

y1=X1b+Z1u1+e1 y_1 = X_1b + Z_1u_1 + e_1

y2=X2b+Z2u2+e2 y_2 = X_2b + Z_2u_2 + e_2

[y1y2]=[X100X2][b1b2]+[Z100Z2][u1u2]+[e1e2] \begin{bmatrix} {y_1} \\ {y_2} \end{bmatrix} = \begin{bmatrix} {X_1} & {0}\\ {0} & {X_2} \end{bmatrix} \begin{bmatrix} {b_1} \\ {b_2} \end{bmatrix} + \begin{bmatrix} {Z_1} & {0}\\ {0} & {Z_2} \end{bmatrix} \begin{bmatrix} {u_1} \\ {u_2} \end{bmatrix} + \begin{bmatrix} {e_1} \\ {e_2} \end{bmatrix}

  • 𝑦𝑖𝑦_𝑖 = vector of observations for the ith trait;

  • 𝑏𝑖𝑏_𝑖 = vector of fixed effects for the ith trait;

  • u𝑖u_𝑖 = vector of random genetic (plant-genotype-level) effects for the ith trait;

  • 𝑒𝑖𝑒_𝑖 = vector of random residual effects for the ith trait;

  • 𝑋𝑖𝑋_𝑖 and 𝑍𝑖𝑍_𝑖 = incidence matrices relating records of the ith trait to fixed and random animal effects.

Below, you can see what the V matrix looks like in this simple case of two traits. Notice the terms g21g_{21} and g12g_{12} plus r21r_{21} and r12r_{12}. These represent genetic and residual covariances that will be estimated when the model is fit. Univariate models simply assume dthese covariances = 0.

var[u1u2e1e2]=V=[g11g1200g21g220000r11r1200r21r22] var\begin{bmatrix} {u_1} \\ {u_2} \\ {e_1} \\ {e_2} \end{bmatrix} = V = \begin{bmatrix} {g_{11}} & {g_{12}} & {0} &{0} \\ {g_{21}} & {g_{22}} & {0} &{0} \\ 0 & 0 & {r_{11}} &{r_{12}} \\ 0 & 0 & {r_{21}} &{r_{22}} \\ \end{bmatrix} G=[σu12σu122σu122σu22]I;R=I[σe12σe122σe122σe22] G = \begin{bmatrix} {\sigma^2_{u_1}} & {\sigma^2_{u_{12}}} \\ {\sigma^2_{u_{12}}} & {\sigma^2_{u_{2}}} \end{bmatrix} \otimes I; R = I \otimes \begin{bmatrix} {\sigma^2_{e_1}} & {\sigma^2_{e_{12}}} \\ {\sigma^2_{e_{12}}} & {\sigma^2_{e_{2}}} \end{bmatrix}

Let’s look at raw phenotypic correlations:

alny_alt %>% 
  select(biomass.1,fall.vigor.av,spring.vigor.av) %>% 
  as.matrix %>% 
  cor(., use='pairwise.complete.obs') %>% round(.,2)
                biomass.1 fall.vigor.av spring.vigor.av
biomass.1            1.00         -0.09            0.28
fall.vigor.av       -0.09          1.00            0.42
spring.vigor.av      0.28          0.42            1.00

Positive phenotypic correlation between biomass.1 and spring.vigor.av.

Wide model: matrix of responses

There are two ways to accomplish a multi-trait model in asreml: “wide” and “long” form.

Wide form poses the two phenotypes as different responses and your Y vector becomes a matrix with a column for each trait. Long form stacks the traits such that Y remains a column vector.

Let’s try modeling those two traits.

NOTE: You’ll see a special term “traits” appear. Like “units”, it is a special term using by asreml. It creates an internal factor corresponding to the columns in the response matrix (the traits).

# Make factors before modeling
alny_df<-alny_alt %>% 
  mutate(entry=as.factor(entry),
         site.year=as.factor(site.year),
         site=as.factor(site),
         # unless you are expecting / modeling a linear trend in year
         # convert it to factor
         year=as.factor(year),
         # explicitly nest values of rep in site.year
         # review explicit vs. implicit nesting
         rep=as.factor(paste0(site.year,"-",rep)),
         row=as.factor(row),
         range=as.factor(range)) %>% 
     # ORDER IS IMPORTANT HERE
  arrange(site.year,range,row)

# I started by fiddling with a simpler model
### notice the lack of residual structure
# alny_mod <- asreml(cbind(biomass.1,spring.vigor.av)~trait+trait:site.year, 
#                    random=~us(trait):entry + at(trait):rep,
#                    residual=~id(units):diag(trait),
#                    data=alny_df, maxiter=90,
#                    na.action = na.method(y = "include", x = "include"))
# After verifying this worked, I added back the AR1:AR1 design like so:
alny_mod <- asreml(cbind(biomass.1,spring.vigor.av)~trait+trait:site.year, 
                  random=~us(trait):entry + at(trait):rep,
                  residual=~dsum(~ar1(range):ar1(row):diag(trait)|site.year),
                  data=alny_df, maxiter=90,
                  na.action = na.method(y = "include", x = "include"))
ASReml Version 4.2 02/02/2026 11:50:42
          LogLik        Sigma2     DF     wall
 1     -2199.979           1.0    488   11:50:42  (  3 restrained)
 2     -1794.125           1.0    488   11:50:42  (  1 restrained)
 3     -1390.562           1.0    488   11:50:42
 4     -1106.224           1.0    488   11:50:42  (  1 restrained)
 5     -970.1781           1.0    488   11:50:42  (  1 restrained)
 6     -910.9503           1.0    488   11:50:42
 7     -884.2078           1.0    488   11:50:42
 8     -878.1670           1.0    488   11:50:42
 9     -877.1991           1.0    488   11:50:42  (  3 restrained)
10     -877.1026           1.0    488   11:50:42  (  3 restrained)
11     -877.0878           1.0    488   11:50:42  (  3 restrained)
12     -877.0831           1.0    488   11:50:42  (  3 restrained)
13     -877.0804           1.0    488   11:50:42  (  3 restrained)
14     -877.0784           1.0    488   11:50:42  (  3 restrained)
15     -877.0766           1.0    488   11:50:42  (  3 restrained)
16     -877.0751           1.0    488   11:50:42  (  3 restrained)
17     -877.0737           1.0    488   11:50:42  (  3 restrained)
18     -877.0725           1.0    488   11:50:42  (  3 restrained)
19     -877.0714           1.0    488   11:50:42  (  3 restrained)
20     -877.0705           1.0    488   11:50:42  (  3 restrained)
summary(alny_mod)$varcomp
                                                     component   std.error
at(trait, 'biomass.1'):rep                          1.91242174  2.46175862
at(trait, 'spring.vigor.av'):rep                    0.41595948  0.38948385
trait:entry!trait_biomass.1:biomass.1               2.08683064  1.26746356
trait:entry!trait_spring.vigor.av:biomass.1         1.62716420  0.56827658
trait:entry!trait_spring.vigor.av:spring.vigor.av   1.29652343  0.40546225
site.year_24AL!R                                    1.00000000          NA
site.year_24AL!range!cor                            0.18972669  0.10983805
site.year_24AL!row!cor                              0.39109275  0.09083933
site.year_24AL!trait_biomass.1                    224.05796078 46.38157188
site.year_24AL!trait_spring.vigor.av                4.19921596  0.97961339
site.year_24NY!R                                    1.00000000          NA
site.year_24NY!range!cor                           -0.18099805  0.11382558
site.year_24NY!row!cor                              0.05600982  0.10667289
site.year_24NY!trait_biomass.1                     16.08285080  3.40914404
site.year_24NY!trait_spring.vigor.av                3.37645175  0.69649884
site.year_25AL!R                                    1.00000000          NA
site.year_25AL!range!cor                           -0.02959075  0.11795822
site.year_25AL!row!cor                              0.33582993  0.09857263
site.year_25AL!trait_biomass.1                    197.05919532 40.93066682
site.year_25AL!trait_spring.vigor.av                1.64576776  0.39858289
site.year_25NY!R                                    1.00000000          NA
site.year_25NY!range!cor                            0.33616824  0.09459790
site.year_25NY!row!cor                              0.30649519  0.08845506
site.year_25NY!trait_biomass.1                     11.06523430  2.22217011
site.year_25NY!trait_spring.vigor.av                4.45881240  0.95744183
                                                     z.ratio bound %ch
at(trait, 'biomass.1'):rep                         0.7768518     P 0.0
at(trait, 'spring.vigor.av'):rep                   1.0679762     P 0.0
trait:entry!trait_biomass.1:biomass.1              1.6464620     ? 0.3
trait:entry!trait_spring.vigor.av:biomass.1        2.8633315     ? 0.1
trait:entry!trait_spring.vigor.av:spring.vigor.av  3.1976427     ? 0.1
site.year_24AL!R                                          NA     F 0.0
site.year_24AL!range!cor                           1.7273312     U 0.0
site.year_24AL!row!cor                             4.3053239     U 0.0
site.year_24AL!trait_biomass.1                     4.8307539     P 0.0
site.year_24AL!trait_spring.vigor.av               4.2866053     P 0.0
site.year_24NY!R                                          NA     F 0.0
site.year_24NY!range!cor                          -1.5901351     U 0.0
site.year_24NY!row!cor                             0.5250615     U 0.2
site.year_24NY!trait_biomass.1                     4.7175627     P 0.0
site.year_24NY!trait_spring.vigor.av               4.8477493     P 0.0
site.year_25AL!R                                          NA     F 0.0
site.year_25AL!range!cor                          -0.2508579     U 0.2
site.year_25AL!row!cor                             3.4069288     U 0.0
site.year_25AL!trait_biomass.1                     4.8144634     P 0.0
site.year_25AL!trait_spring.vigor.av               4.1290477     P 0.1
site.year_25NY!R                                          NA     F 0.0
site.year_25NY!range!cor                           3.5536545     U 0.0
site.year_25NY!row!cor                             3.4649821     U 0.1
site.year_25NY!trait_biomass.1                     4.9794722     P 0.0
site.year_25NY!trait_spring.vigor.av               4.6570060     P 0.0

What you should notice is that the variance and covariance estimates are all listed in a data.frame.

Unfortunately, I’m unaware of a convenience function that makes it into a simple var-covar matrix.

Here’s my hack:

# Extract only the estimates
g11<-summary(alny_mod)$varcomp %>% .[grepl("biomass.1:biomass.1",rownames(.)),"component"]
g12<-g21<-summary(alny_mod)$varcomp %>% .[grepl("spring.vigor.av:biomass.1",rownames(.)),"component"]
g22<-summary(alny_mod)$varcomp %>% .[grepl("spring.vigor.av:spring.vigor.av",rownames(.)),"component"]

# Manually construct a matrix
G<-matrix(c(g11,g12,g21,g22),
            2, 2, byrow=TRUE,
            dimnames=list(c("biomass.1","spring.vigor.av"),
                          c("biomass.1","spring.vigor.av")))
G
                biomass.1 spring.vigor.av
biomass.1        2.086831        1.627164
spring.vigor.av  1.627164        1.296523

Convert to correlations

cov2cor(G)
                biomass.1 spring.vigor.av
biomass.1       1.0000000       0.9892307
spring.vigor.av 0.9892307       1.0000000

The BLUPs can be extracted same as before.

They do require a bit of clean-up to their names.

The code below has a bunch of steps but all I’m doing is refactoring the output to be a nice looking data.frame.

blups<-summary(alny_mod,coef=T)$coef.random %>% 
  # confert to data.frame
  as.data.frame %>% 
  # change the rownames to a column the the data.frame
  rownames_to_column(var = "entry") %>% 
  # filter to only the BLUPs for the entry main-effect 
  # (output includes all random terms)
  filter(grepl("trait_",entry),
         grepl(":entry_",entry)) %>% 
  # Clean up the entry names by remove prefixes
  mutate(entry=gsub("trait_","",entry),
         entry=gsub("entry_","",entry)) %>% 
  # Separate the entry name, it contains both trait and entry names now
  separate(entry,c("Trait","entry"),sep = ":") %>% 
  select(Trait,entry,solution) %>% 
  # Convert from long to wide (put the trait BLUPs next to each other as columns)
  spread(Trait,solution)
blups %>% head
         entry  biomass.1 spring.vigor.av
1      17MDCCH  0.2290380       0.1682749
2      17MDCCS  0.6700303       0.5380645
3      17RMCCL -1.3385127      -1.0287830
4 18NCCCEGiant  0.5832913       0.4780943
5       19MDCC  1.3092683       1.0629740
6     19NCCCLH  0.1288877       0.1047911
blups %>% 
  ggplot(.,aes(x=spring.vigor.av,y=biomass.1)) + geom_point() + labs(title='BLUPs from Multivar Model')

alny_alt %>% 
  ggplot(.,aes(x=spring.vigor.av,y=biomass.1)) + geom_point()

Last thing I want to show here is model comparison.

You can’t directly compare a uni-variate to a multivariate model. They are different datasets, after all.

Instead, first fit a multi-trait model with the us(traits) variance structure. This estimates the genetic variance-covariances.

Next, fit the same model, but change to diag(traits). This is equivalent to model the two traits separately.

A LRT and/or AIC comparison can then be made.

alny_mod_diagtraits <- asreml(cbind(biomass.1,spring.vigor.av)~trait+trait:site.year, 
                              random=~diag(trait):entry + at(trait):rep,
                              residual=~dsum(~ar1(range):ar1(row):diag(trait)|site.year),
                              data=alny_df, maxiter=90,
                              na.action = na.method(y = "include", x = "include"))
ASReml Version 4.2 02/02/2026 11:50:42
          LogLik        Sigma2     DF     wall
 1     -2189.708           1.0    488   11:50:42  (  3 restrained)
 2     -1784.732           1.0    488   11:50:42
 3     -1387.722           1.0    488   11:50:42
 4     -1107.703           1.0    488   11:50:42  (  1 restrained)
 5     -973.2809           1.0    488   11:50:42
 6     -905.2485           1.0    488   11:50:42
 7     -886.2921           1.0    488   11:50:42
 8     -882.6019           1.0    488   11:50:42
 9     -882.1435           1.0    488   11:50:42
10     -882.0897           1.0    488   11:50:42
11     -882.0815           1.0    488   11:50:42
12     -882.0800           1.0    488   11:50:42
summary(alny_mod_diagtraits)$varcomp
                                         component  std.error      z.ratio
at(trait, 'biomass.1'):rep            1.711014e+00  2.3573054  0.725834613
at(trait, 'spring.vigor.av'):rep      4.182708e-01  0.3905696  1.070925268
trait:entry!trait_biomass.1           1.484542e+00  1.0618270  1.398101335
trait:entry!trait_spring.vigor.av     1.275245e+00  0.4038857  3.157440333
site.year_24AL!R                      1.000000e+00         NA           NA
site.year_24AL!range!cor              1.934024e-01  0.1119265  1.727941004
site.year_24AL!row!cor                4.160064e-01  0.0902496  4.609509889
site.year_24AL!trait_biomass.1        2.358932e+02 49.9376628  4.723752971
site.year_24AL!trait_spring.vigor.av  3.955565e+00  0.9331135  4.239103551
site.year_24NY!R                      1.000000e+00         NA           NA
site.year_24NY!range!cor             -2.024806e-01  0.1137123 -1.780639558
site.year_24NY!row!cor                7.290569e-04  0.1070799  0.006808531
site.year_24NY!trait_biomass.1        1.539425e+01  3.3366494  4.613686149
site.year_24NY!trait_spring.vigor.av  3.630845e+00  0.7505270  4.837727606
site.year_25AL!R                      1.000000e+00         NA           NA
site.year_25AL!range!cor             -2.124033e-02  0.1157803 -0.183453782
site.year_25AL!row!cor                3.054894e-01  0.1015944  3.006951174
site.year_25AL!trait_biomass.1        2.170321e+02 44.8714581  4.836752469
site.year_25AL!trait_spring.vigor.av  1.561906e+00  0.3842398  4.064924788
site.year_25NY!R                      1.000000e+00         NA           NA
site.year_25NY!range!cor              2.822685e-01  0.1037768  2.719957614
site.year_25NY!row!cor                4.109012e-01  0.0852862  4.817909476
site.year_25NY!trait_biomass.1        1.068695e+01  2.2224356  4.808666390
site.year_25NY!trait_spring.vigor.av  4.841211e+00  1.0831756  4.469461502
                                     bound  %ch
at(trait, 'biomass.1'):rep               P  0.9
at(trait, 'spring.vigor.av'):rep         P  0.1
trait:entry!trait_biomass.1              P  0.3
trait:entry!trait_spring.vigor.av        P  0.1
site.year_24AL!R                         F  0.0
site.year_24AL!range!cor                 U  0.0
site.year_24AL!row!cor                   U  0.0
site.year_24AL!trait_biomass.1           P  0.0
site.year_24AL!trait_spring.vigor.av     P  0.0
site.year_24NY!R                         F  0.0
site.year_24NY!range!cor                 U  0.0
site.year_24NY!row!cor                   U 21.6
site.year_24NY!trait_biomass.1           P  0.1
site.year_24NY!trait_spring.vigor.av     P  0.0
site.year_25AL!R                         F  0.0
site.year_25AL!range!cor                 U  0.5
site.year_25AL!row!cor                   U  0.1
site.year_25AL!trait_biomass.1           P  0.0
site.year_25AL!trait_spring.vigor.av     P  0.1
site.year_25NY!R                         F  0.0
site.year_25NY!range!cor                 U  0.5
site.year_25NY!row!cor                   U  0.3
site.year_25NY!trait_biomass.1           P  0.1
site.year_25NY!trait_spring.vigor.av     P  0.2
lrt(alny_mod_diagtraits,alny_mod)
Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)

                             df LR-statistic Pr(Chisq)    
alny_mod/alny_mod_diagtraits  1       10.019 0.0007746 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(alny_mod)$aic
[1] 1796.141
attr(,"parameters")
[1] 21
summary(alny_mod_diagtraits)$aic
[1] 1804.16
attr(,"parameters")
[1] 20

We have a clear result. Including the genetic covariance between traits leads to a significantly better model fit.

You can declare those covariance components different from zero.

Long model: vector of responses

alny_long<-alny_alt %>%
  pivot_longer(cols = c(biomass.1, spring.vigor.av),
               names_to  = "trait",
               values_to = "value",values_drop_na = T) %>%
  mutate(trait = factor(trait),
         entry=as.factor(entry),
         site.year=as.factor(site.year),
         site=as.factor(site),
         # unless you are expecting / modeling a linear trend in year
         # convert it to factor
         year=as.factor(year),
         # explicitly nest values of rep in site.year
         # review explicit vs. implicit nesting
         rep=as.factor(paste0(site.year,"-",rep)),
         row=as.factor(row),
         range=as.factor(range)) %>% 
   arrange(site.year, range, row, trait)
alny_long %>% select(site.year,entry,rep,row,range,trait,value) %>% head
# A tibble: 6 × 7
  site.year entry    rep         row   range trait           value
  <fct>     <fct>    <fct>       <fct> <fct> <fct>           <dbl>
1 24AL      Aldo     24AL-24AL-1 1     1     biomass.1        56.3
2 24AL      Aldo     24AL-24AL-1 1     1     spring.vigor.av   9  
3 24AL      21NCCCLH 24AL-24AL-1 2     1     biomass.1        46.4
4 24AL      21NCCCLH 24AL-24AL-1 2     1     spring.vigor.av   5  
5 24AL      23MDCC   24AL-24AL-1 3     1     biomass.1        55.8
6 24AL      23MDCC   24AL-24AL-1 3     1     spring.vigor.av   5  

Key points:

  • Each plot now appears twice (once per trait)
  • Traits are distinguished by the “trait” column
  • Missing values are retained as explicit rows with value = NA.
  • Ordering still matters for the AR1 terms
alny_long_mod <- asreml(
  fixed    = value ~ trait + trait:site.year,
  random   = ~ us(trait):entry + us(trait):rep,
  # residual = ~ dsum(~ar1(range):ar1(row):diag(trait) | site.year),
  data     = alny_long,
  maxiter  = 90,
  na.action = na.method(y="include", x="include"))
ASReml Version 4.2 02/02/2026 11:50:42
          LogLik        Sigma2     DF     wall
 1     -1243.737      49.90869    488   11:50:42  (  5 restrained)
 2     -1239.542      48.94557    488   11:50:42  (  4 restrained)
 3     -1231.443      49.06078    488   11:50:42  (  6 restrained)
 4     -1228.730      47.39486    488   11:50:42  (  4 restrained)
 5     -1228.725      47.38316    488   11:50:42  (  4 restrained)
 6     -1227.820      47.01267    488   11:50:42  (  4 restrained)
 7     -1227.818      47.01039    488   11:50:42  (  6 restrained)
 8     -1227.317      46.86020    488   11:50:42  (  4 restrained)
 9     -1227.316      46.85783    488   11:50:42  (  6 restrained)
10     -1227.006      46.77536    488   11:50:42  (  4 restrained)
11     -1227.006      46.77254    488   11:50:42  (  6 restrained)
12     -1226.800      46.72221    488   11:50:42  (  4 restrained)
13     -1226.800      46.71891    488   11:50:42  (  6 restrained)
14     -1226.656      46.68684    488   11:50:42  (  4 restrained)
15     -1226.656      46.68300    488   11:50:42  (  6 restrained)
16     -1226.551      46.66226    488   11:50:42  (  6 restrained)
17     -1226.551      46.65852    488   11:50:42  (  6 restrained)
18     -1226.473      46.64214    488   11:50:42  (  5 restrained)
19     -1226.428      46.79242    488   11:50:42  (  6 restrained)
20     -1226.365      46.63227    488   11:50:42  (  5 restrained)
21     -1226.317      46.78490    488   11:50:42  (  6 restrained)
22     -1226.272      46.61667    488   11:50:42  (  5 restrained)
23     -1226.220      46.77570    488   11:50:42  (  6 restrained)
24     -1226.188      46.60401    488   11:50:42  (  6 restrained)
25     -1226.178      46.58351    488   11:50:42  (  6 restrained)
26     -1226.169      46.58424    488   11:50:42  (  6 restrained)
27     -1226.161      46.57983    488   11:50:42  (  6 restrained)
28     -1226.154      46.57973    488   11:50:42  (  6 restrained)
29     -1226.147      46.57706    488   11:50:42  (  6 restrained)
30     -1226.141      46.57632    488   11:50:42  (  6 restrained)
31     -1226.135      46.57450    488   11:50:42  (  6 restrained)
32     -1226.130      46.57358    488   11:50:42  (  6 restrained)
33     -1226.125      46.57222    488   11:50:42  (  6 restrained)
34     -1226.120      46.57129    488   11:50:42  (  6 restrained)
35     -1226.116      46.57021    488   11:50:42  (  6 restrained)
36     -1226.111      46.56934    488   11:50:42  (  6 restrained)
37     -1226.107      46.56844    488   11:50:42  (  6 restrained)
38     -1226.104      46.56765    488   11:50:42  (  6 restrained)
39     -1226.100      46.56688    488   11:50:42  (  6 restrained)
40     -1226.097      46.56618    488   11:50:42  (  6 restrained)
41     -1226.094      46.56550    488   11:50:42  (  6 restrained)
42     -1226.091      46.56488    488   11:50:42  (  6 restrained)
43     -1226.088      46.56428    488   11:50:42  (  6 restrained)
44     -1226.086      46.56372    488   11:50:42  (  6 restrained)
45     -1226.083      46.56319    488   11:50:42  (  6 restrained)
46     -1226.081      46.56269    488   11:50:42  (  6 restrained)
47     -1226.079      46.56221    488   11:50:42  (  6 restrained)
48     -1226.077      46.56176    488   11:50:42  (  6 restrained)
49     -1226.075      46.56133    488   11:50:42  (  6 restrained)
50     -1226.073      46.56092    488   11:50:42  (  6 restrained)
51     -1226.071      46.56053    488   11:50:42  (  6 restrained)
52     -1226.069      46.56016    488   11:50:42  (  6 restrained)
53     -1226.067      46.55980    488   11:50:42  (  6 restrained)
54     -1226.066      46.55947    488   11:50:42  (  6 restrained)
55     -1226.064      46.55914    488   11:50:42  (  6 restrained)
56     -1226.063      46.55883    488   11:50:43  (  6 restrained)
57     -1226.061      46.55854    488   11:50:43  (  6 restrained)
58     -1226.060      46.55826    488   11:50:43  (  6 restrained)
59     -1226.058      46.55798    488   11:50:43  (  6 restrained)
60     -1226.057      46.55772    488   11:50:43  (  6 restrained)
61     -1226.056      46.55747    488   11:50:43  (  6 restrained)
62     -1226.055      46.55723    488   11:50:43  (  6 restrained)
63     -1226.054      46.55700    488   11:50:43  (  6 restrained)
64     -1226.052      46.55678    488   11:50:43  (  6 restrained)
65     -1226.051      46.55656    488   11:50:43  (  6 restrained)
66     -1226.050      46.55635    488   11:50:43  (  6 restrained)
67     -1226.049      46.55616    488   11:50:43  (  6 restrained)
68     -1226.048      46.55596    488   11:50:43  (  6 restrained)
69     -1226.047      46.55578    488   11:50:43  (  6 restrained)
70     -1226.046      46.55560    488   11:50:43  (  6 restrained)
71     -1226.046      46.55542    488   11:50:43  (  6 restrained)
72     -1226.045      46.55526    488   11:50:43  (  6 restrained)
73     -1226.044      46.55509    488   11:50:43  (  6 restrained)
74     -1226.043      46.55494    488   11:50:43  (  6 restrained)
75     -1226.042      46.55479    488   11:50:43  (  6 restrained)
76     -1226.042      46.55464    488   11:50:43  (  6 restrained)
77     -1226.041      46.55450    488   11:50:43  (  6 restrained)
78     -1226.040      46.55436    488   11:50:43  (  6 restrained)
79     -1226.039      46.55422    488   11:50:43  (  6 restrained)
80     -1226.039      46.55409    488   11:50:43  (  6 restrained)
81     -1226.038      46.55397    488   11:50:43  (  6 restrained)
82     -1226.037      46.55384    488   11:50:43  (  6 restrained)
83     -1226.037      46.55372    488   11:50:43  (  6 restrained)
84     -1226.036      46.55361    488   11:50:43  (  6 restrained)
85     -1226.036      46.55350    488   11:50:43  (  6 restrained)
86     -1226.035      46.55339    488   11:50:43  (  6 restrained)
87     -1226.034      46.55328    488   11:50:43  (  6 restrained)
88     -1226.034      46.55318    488   11:50:43  (  6 restrained)
89     -1226.033      46.55307    488   11:50:43  (  6 restrained)
90     -1226.033      46.55298    488   11:50:43  (  6 restrained)
Warning in asreml(fixed = value ~ trait + trait:site.year, random =
~us(trait):entry + : Warning : US updates modified 1 times in iteration 1 to
remain positive definite.
Warning in asreml(fixed = value ~ trait + trait:site.year, random =
~us(trait):entry + : Warning : US updates modified 2 times in iteration 3 to
remain positive definite.
Warning in asreml(fixed = value ~ trait + trait:site.year, random =
~us(trait):entry + : Warning : US updates modified 1 times in iteration 5 to
remain positive definite.
Warning in asreml(fixed = value ~ trait + trait:site.year, random =
~us(trait):entry + : Warning : US updates modified 2 times in iteration 7 to
remain positive definite.
Warning in asreml(fixed = value ~ trait + trait:site.year, random =
~us(trait):entry + : Warning : US updates modified 2 times in iteration 9 to
remain positive definite.
Warning in asreml(fixed = value ~ trait + trait:site.year, random =
~us(trait):entry + : Log-likelihood not converged

Modeling this way isn’t going so well.

With the residual term specified, I received a warning that there were singularities and the model could not be solved.

Remove the residual term and we get the model to fit… but it says NOT CONVERGED.

Still, we are left with an output I can show you as an example:

summary(alny_long_mod)$varcomp
                                                    component std.error
trait:rep!trait_biomass.1:biomass.1                5.58549117  5.034742
trait:rep!trait_spring.vigor.av:biomass.1         -0.31930658  2.104452
trait:rep!trait_spring.vigor.av:spring.vigor.av    0.01977474  2.106323
trait:entry!trait_biomass.1:biomass.1             20.39086187  6.536167
trait:entry!trait_spring.vigor.av:biomass.1        3.05569762  3.448362
trait:entry!trait_spring.vigor.av:spring.vigor.av  0.46627054  6.900820
units!R                                           46.55297587  3.135684
                                                       z.ratio bound %ch
trait:rep!trait_biomass.1:biomass.1                1.109389841     ? 0.0
trait:rep!trait_spring.vigor.av:biomass.1         -0.151729116     ? 0.3
trait:rep!trait_spring.vigor.av:spring.vigor.av    0.009388277     ? 0.6
trait:entry!trait_biomass.1:biomass.1              3.119697201     ? 0.0
trait:entry!trait_spring.vigor.av:biomass.1        0.886130122     ? 0.1
trait:entry!trait_spring.vigor.av:spring.vigor.av  0.067567412     ? 0.2
units!R                                           14.846196185     P 0.0

Modelling Advice

NOTICE: Multivariate models are computationally challenging. Memory requirements for REML increases (I think) with the cube of the number of parameters. In other words, adding more traits very rapidly increases compute demand. REML sometimes will not converge to a global optimum. Preliminary analyses to estimate “starting values” for REML algorithms to use are often needed.

Some guidance:

  • First analyze single traits (univariate cases)
  • Make emprical estimates of the variances for each trait AND a univariate approach like the covariance among BLUPs to estimate covariances among traits.
  • Feed univariate estimates of variance and covariance into asreml or your favorite solver as “starting values” to increase the chance of convergence.
  • Build up step-by-step from single-trait to many/all traits in the analysis.
  • Monitor your system resources (%CPU, RAM usage) carefully.
  • Including Pedigree and/or Genomic information as relationship matrices in these models (topics we will cover later) add to the compute cost but can help make them solve-able.
  • Lastly, it can be helpful to center+scale all of the traits-to-be-analyzed. When confining all traits to the same units (standard deviations), REML may be better behaved.

Part II: Two-stage analysis

Section below is preparatory for your downstream goals (GWAS and GP).

Don’t BLUP twice!

We often have very large, multi-year, multi-location, multi-trial-type (MET) datasets we use for downstream genome-wide association and genomic prediction purposes. The number of plots can be in the range of 10- or even 100,000 plots with many thousands of unique genotypes observed in unbalanced fashion across heterogenous experimental designs. All that to say, the computational burden and level of expertise required to execute such analyses directly on these data is very great.

Because they focus on computational efficiency, many software packages used for GWAS (e.g. FarmCPU Liu et al. (2016)) limit the fixed and random-effects structures that can be modeled.

In contrast, packages like asreml and sommer can be used for GWAS, but are optimized to do so.

While a unified, single-step analysis is always the statistically best approach, computational burdens still often necessitate a “two-stage” analysis approach:

Stage 1: Model the raw data and obtain ‘summary values’ (at the level of genetic units, e.g. for each genotype in your population), adjusted for the complexities of experimental design.

Stage 2: Conduct inference or prediction analysis (e.g. GWAS) using the adjusted genotypic values from Stage 1 as the response.

Options for Stage 1:

  • Fit lines as random-effects ❌: shrinkage-based predictions account for potentially unbalanced data.
  • Fit lines as fixed-effects ✅: makes no assumption of a distribution, does not induce shrinkage accounting for unbalanced data.

Despite the benefits of BLUP for ranking and selection, you should NEVER BLUP TWICE (highly recommend you read Holland and Piepho (2024) also Garrick, Taylor, and Fernando (2009)).

Why not? In downstream GWAS and GP applications, we usually fit a random-effect for genotypes (lines in your population). We’ll talk about why and how later. The problem is, if your dependent-variable in the GWAS analysis is already a shrinkage-based prediction, you’ll shrink the distribution twice and lose at least some of the signal you are looking for.

“It is inappropriate to treat genetic effects as random in both stages, as this results in a BLUP analysis of BLUPs, hence the recommendation “don’t BLUP twice.” ~Holland & Piepho 2024

As depicted in the figure below, you’ll want to keep lines as fixed-effects, i.e. use BLUEs from Stage 1 for your GWAS/GP analyses.

From Holland & Piepho 2024: Two-stage analysis workflow for (A) single-environment trials and (B) multienvironment trials. In the first stage of a single-environment trial A), a model Yjk = µ + Gj + Bk + εjk is fit to the data with Gj line effects treated as fixed and Bk block effects treated as random. BLUEs of line values are estimated and used as dependent variables in a second-stage analysis testing each marker’s effect: = µ + xjpβp + Aj + ⁠, where xjp is a numeric indicator variable for allele dosage at marker p in line j, βp is the fixed effect of marker p, Aj is the polygenic background effect line j with covariances equal to the genomic relationship matrix G times the additive genetic variance (⁠ ⁠), and is the residual effect, distinguished from plot-level experimental error effects (εjk in stage 1), as it includes both estimation error of the line phenotypic mean value and genetic residual reflecting genotypic values not capture by the genomic relationship matrix. For multienvironment trials B), random terms for environment main effects (Ei) and genotype-by-environment interactions (GEij) are added to the first-stage model, but the second-stage model for marker main effects remains unchanged.

Weighted Analysis

Stage 2 models, without weighting, assume independent and identical errors among the adjusted means being used as the response. However, real and unbalanced data mean there is real information on error variance-covariance among the adjusted-genotype-means from Stage 1 being lost.

There are several ways of preserving that information, as “weights” applied to the residual in Stage 2 Damesa et al. (2017) Fernández-González and Isidro y Sánchez (2025).

I’ll show you an example of a weight scheme and how it works below.

De-regression

But first a digression on de-regression.

It is possible to fit a mixed-model with genotypes as random and then later un-shrink them. There are some good reasons to do so.

Deregression is an approach that is often used in animal breeding (Garrick, Taylor, and Fernando 2009; Holland and Piepho 2024; Isik, Holland, and Maltecca 2017). BLUPs get un-shrunken before submission to the next stage of analysis.

The de-regressed BLUP (drgBLUP) is simply the BLUP divided by a quantity called the reliability or r2r^2. It is very similar (possibly equivalent) to the BLUE.

The reliability is defined as: r2=1PEVσg2r^2=1−\frac{PEV}{\sigma^2_g}

So drgBLUP=BLUPr2drgBLUP = \frac{BLUP}{{r^2}}

Recall that reliability ranges from 0 to 1 and measures the certainty surrounding each BLUP value; or rather, the probability the BLUP would change if another experiment (data point) were added. So if r2=1r^2=1, then we have essentially zero expected error for a BLUP. For example, check varieties often have so many observations in a dataset that they will have high reliability.

So if a de-regressed BLUP is defined as BLUPr2\frac{BLUP}{r^2} you can see that for clones with high reliability, the BLUP will stay essentially the same, but if the reliability is low, the BLUP will be un-shrink or de-regress strongly. This is going to happen because genotypes with few observations and thus low reliability will be more strongly shrunken to zero during the first step.

Below is a density plot showing the distribution of the original data compared to the BLUP and the drgBLUP. This is meant to show how shrinkage and unshrinkage effect the data. Since BLUPs and drgBLUPs are centered on the mean, I added back the grand mean for comparison to the original data.

Comparing BLUP vs. De-regressed BLUP for two highest (top) and lowest (bottom) reliability genotypes in a dataset of Cassava clones.

Weighted 2nd stage

Regardless of whether BLUE or drg-BLUP are used in the 2nd stage, if data are unbalanced and/or come from complex data, then the precision of BLUE/drgBLUP is likely to vary. It has been shown to improve accuracy in the 2nd stage to include a weight to the R-structure (residual var-covar) according to results of the first stage.

The best way to do that accounts for both variances and covariances among the responses in the second stage; but it is very computationally intensive (Möhring and Piepho 2009; Damesa et al. 2017). Instead, “diagonal” approaches are easier. The R structure in stage 2 includes wIσe2wI\sigma^2_e. Where w is a n×1n \times 1 vector of “weights”. There are different options for weights.

Garrick, Taylor, and Fernando (2009) recommended weight is:

WT=1H20.1+1r2rr×H2 WT = \frac{1-H^{2}}{0.1+\frac{1-r^{2}}{r^{r}} \times H^{2}}

H2H^2 is broad-sense heritability.

r2r^2 is the reliability.

I made a plot showing how WT varies across a range of H2 and REL. This is to show how the weighting behaves. As we see, WT scales with REL and H2. H2 conditions the overall variability among individuals in their weight in the analysis. This works such that if heritability is low, there is a greater relative premium (weight) placed on individuals with high quality data (high REL), usually those with high numbers of observations, e.g. check-genotypes. If heritability is high, WTs are low and there isn’t much difference between low and high REL.

Here’s a plot that shows from the actual blups, how the WT scales with the number of observations-per-genotype.

FINAL NOTE ON WEIGHTED ANALYSIS: This is an area that deserves consideration and is actively under discussion / development in the literature. Check-out this very recent article Fernández-González and Isidro y Sánchez (2025).

Stage-wise analysis presented by Gonzalez and Sanchez, 2025. The single-stage model is on top, with the different types of two-stage models below it. The stage 1 model is common to all types of two-stage models, which only differ on how the stage 2 model accounts for the stage 1 estimation error variance-covariance.

Example

We didn’t cover the downstream applications yet. For that reason, you’ll forgive me for saving an example for once we’re doing Stage 2.

Hands-on

Now it’s your turn!

Our goal remains to jointly analyze the entire ALT dataset and to produce estimates of genetic parameters (heritability) and rankings of the entries (BLUPs).

For this lesson’s activity, you have two added objectives:

  1. Determine whether and what spatial models can be fit to the dataset.
  2. Determine whether there are significant genetic correlations between at least a pair of traits in your data chunk.

Use appropriate model comparison procedures to determine the optimal fit for your data.

As before, DATA OPTIONS:

  1. Use your previous data chunk (important it includes multiple site-years)
  2. Pick a bigger, more complex chunk
  3. For the very bold: try analyzing the entire dataset.

References

Animal Breeding Plans.Jay L. Lush.” 1946. The Quarterly Review of Biology 21 (3): 292–93. https://doi.org/10.1086/395356.
Damesa, Tigist Mideksa, Jens Möhring, Mosisa Worku, and Hans-Peter Piepho. 2017. “One Step at a Time: Stage-Wise Analysis of a Series of Experiments.” Agronomy Journal 109 (3): 845–57. https://doi.org/10.2134/agronj2016.07.0395.
Fernández-González, Javier, and Julio Isidro y Sánchez. 2025. “Optimizing Fully-Efficient Two-Stage Models for Genomic Selection Using Open-Source Software.” Plant Methods 21 (1). https://doi.org/10.1186/s13007-024-01318-9.
Garrick, Dorian J, Jeremy F Taylor, and Rohan L Fernando. 2009. “Deregressing Estimated Breeding Values and Weighting Information for Genomic Regression Analyses.” Genetics Selection Evolution 41 (1). https://doi.org/10.1186/1297-9686-41-55.
Hill, William G, and Trudy F C Mackay. 2004. “D. S. Falconer and Introduction to Quantitative Genetics.” Genetics 167 (4): 1529–36. https://doi.org/10.1093/genetics/167.4.1529.
Holland, James B, and Hans-Peter Piepho. 2024. “Don’t BLUP Twice.” Edited by A Lipka. G3: Genes, Genomes, Genetics, November. https://doi.org/10.1093/g3journal/jkae250.
Isik, Fikret, James Holland, and Christian Maltecca. 2017. Genetic Data Analysis for Plant and Animal Breeding. Springer International Publishing. https://doi.org/10.1007/978-3-319-55177-7.
Lande, Russell, and Stevan J. Arnold. 1983. “THE MEASUREMENT OF SELECTION ON CORRELATED CHARACTERS.” Evolution 37 (6): 1210–26. https://doi.org/10.1111/j.1558-5646.1983.tb00236.x.
Liu, Xiaolei, Meng Huang, Bin Fan, Edward S. Buckler, and Zhiwu Zhang. 2016. “Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies.” Edited by Jennifer Listgarten. PLOS Genetics 12 (2): e1005767. https://doi.org/10.1371/journal.pgen.1005767.
Möhring, J., and H.-P. Piepho. 2009. “Comparison of Weighting in Two-Stage Analysis of Plant Breeding Trials.” Crop Science 49 (6): 1977–88. https://doi.org/10.2135/cropsci2009.02.0083.
Rodríguez-Álvarez, María Xosé, Martin P. Boer, Fred A. van Eeuwijk, and Paul H. C. Eilers. 2018. “Correcting for Spatial Heterogeneity in Plant Breeding Experiments with P-Splines.” Spatial Statistics 23 (March): 52–71. https://doi.org/10.1016/j.spasta.2017.10.003.